ABSTRACT

The report reviews competency tests designed to
evaluate inservice and prospective teachers. The most frequently used
instrument is the National Teacher Examination (NTE). It is designed
to measure academic preparation in four domains: communication
skills, general education, professional education, and subject field
specialization, and should be used, if at all, only as one part of
the initial teacher selection process. The Dallas Independent School
District has found that the Wesman Personnel Classification Test, a
measure of verbal and quantitative ability, is at least as good a 
predictor of rated teaching effectiveness as the NTE. Other factors 
which a school district must consider in planning a testing 
requirement are legal issues, minorities' scores, and state 
requirements for teacher certification. The Teacher Perceiver, a 
structured interview, and psychological testing have been considered 
as possible components of the teacher selection process. It is
concluded that no single method of teacher selection is a panacea. 
All sources of information have some merit in the selection process. 
However, no single source should be relied upon to the exclusion of 
others. (DWH) 

NO PANACEAS: A BRIEF DISCUSSION OF TEACHER SELECTION INSTRUMENTS

Summary 



The widespread perceptions that public schools are doing a poor job and that
many teachers are themselves deficient in basic academic skills have caused
many states and localities to turn to competency tests to evaluate their
teachers and/or applicants. Currently, 30 states require or have set a date
to require applicants for teacher certification to be tested for competency 
in some combination of basic skills, subject matter knowledge, or pedagogical 
knowledge. 

The test battery in widest use nationwide is the National Teacher Examinations 
(NTE), which is now required by more than 200 school districts and eight 
states. The NTE is designed to measure academic preparation in four domains: 
communication skills, general education, professional education, and subject 
field specialization. 

There are no validation data available for the most recent version of the 
recently revamped NTE; the previous version, however, had been in use for 
some 40 years without proven predictive validity for identifying good 
teachers. The test's publishers recommend that it be used only as one of 
many criteria for initial selection of teachers, and that it not be used in
any way with inservice teachers. In our opinion, it may have value only as 
an indicator of general literacy; vigilance against over-interpreting the 
scores is essential. 

The Dallas Independent School District has found that the Wesman Personnel 
Classification Test, a 28-minute measure of verbal and quantitative ability, 
is at least as good a predictor of rated teaching effectiveness as the NTE. 
Wesman Verbal scores were also related to achievement gains among secondary 
students.

Other factors to be considered in planning a possible testing requirement 
are legal issues, minorities' scores, and state requirements.

o In general, the key issues in court challenges to the NTE (and
this probably applies to any test) appear to be (1) intentional
racial discrimination, and (2) the content validity of the test.
Local validation studies would be required before using any test.

o Minorities score substantially lower than Anglos on both the NTE
and the Wesman, as on most standardized tests. This could lead
to conflict with the District's affirmative action commitment.

o The State of Texas will require a test for teacher certification 
beginning in 1986. The State is just beginning to require a test 
(the Pre-Professional Skills Test, PPST) for college students 
entering teacher certification programs. 

We conclude that there is probably little for AISD to gain by testing
teachers, considering the State's 1986 requirement. If testing is to be
done, a brief test of verbal and quantitative skills such as the Wesman 
may be more appropriate to use as an indicator of basic literacy. 

Psychological testing and the Teacher Perceiver, a structured interview, 
have also been considered as possible components of the teacher selection 
process. (The latter is already used in AISD.) It is concluded that 
psychological testing probably has no potential usefulness for the District, 
while the Teacher Perceiver requires additional local validation. 

NO PANACEAS: A BRIEF DISCUSSION OF TEACHER SELECTION INSTRUMENTS



I. The Problem: Teacher Competence 

The widespread perceptions that public schools are doing a poor job and 
that many teachers are themselves deficient in basic academic skills have 
caused many states and localities to turn to competency testing to evaluate 
their teachers and/or applicants. Currently, 30 states require or have set a
date to require applicants for teacher certification to be tested for
competency in basic skills, subject matter knowledge, pedagogical
knowledge, or some combination of these (Euchner 1984).

A need does indeed appear to exist. Consider: 

- In 1978, the Dallas Independent School District gave the Wesman 
Personnel Classification Test of basic skills to 535 first-year 
teachers and to a volunteer group of juniors and seniors from a 
private high school in the area. Not only did the students outperform
the teachers, but more than half the teachers fell below the
score considered acceptable by the district (R. Mitchell 1978). 

- Among 12 groups of college majors listed in a report in the 
Journal of Teacher Education (Weaver 1981), education majors 
nationwide had the lowest SAT Verbal and Math scores. 

- A report from Educational Testing Service (Henderson 1982) 
showed that when high school seniors and college students were 
tested on their knowledge of international affairs, education 
majors scored lower than any other group. 

- In New Mexico, it was reported that none of the state's 136 bilingual
teachers could pass a fourth-grade level Spanish test
(Crewdson 1979).

II. One Response: Competency Tests 

The State of Texas has responded to the problem by requiring that prospective
teachers pass the Pre-Professional Skills Tests (PPST) before they are admitted
to an education major and that, beginning in 1986, they pass an exit
test, as yet unspecified, before they are actually certified as teachers.

The test battery in widest use for certification purposes nationwide is
the National Teacher Examinations (NTE), published by Educational Testing
Service and required by more than 200 school districts (and eight states).
The NTE measures academic preparation in four major domains: communication
skills, general education, professional education, and subject-field
specialization. The first three of these are measured by the Core Battery,
the last by the 28 Specialty Area Tests.

The Core Battery consists of three components:



(1) The Test of Communication Skills (listening comprehension, 
reading comprehension, writing); 

(2) The Test of General Knowledge (literature, fine arts, mathematics, 
science, and social studies); 

(3) The Test of Professional Knowledge (processes and context of 
teaching).

The Specialty Area Tests measure knowledge of the specific subjects in
which the candidates have concentrated and which they intend to teach. 

Validity of the NTE. The NTE has recently been revamped; no validation
data are available for the revised test. The previous version, however,
had been in use for some 40 years without showing much ability to identify
good teachers (which, we should stress, ETS has never claimed it could do
in either its former or present version).

A research review published by ETS and cited by the Mental Measurements 
Yearbook (Buros 1978) found a median correlation of .11 for seven studies 
correlating a composite NTE score and ratings given by principals and 
supervisors during teachers' first year, and .10 during the third year. 
Another review (Shields and Daniele 1982) found that NTE scores correlated
-.01 to .04 with student teaching grades. Although Piper and O'Sullivan
(1981) reported a moderate correlation between NTE scores of preservice
teachers and observational ratings, it is difficult to assess the worth of
this study because the reliability and validity of the criterion measure
used are unknown. In sum, we concur with the reviewer who concluded:

...There can be no shadow of a doubt that the NTE [scores] 
are grossly misused if they are used in any way to predict
classroom teaching effectiveness as conventionally measured . . . 
The grave danger is that this type of quantitative information 
is so handy, seemingly concrete, and beguiling that it will 
receive more emphasis than its validity deserves, and abuses 
damaging to careers and human beings will result (J. Mitchell 
1978, p. 518. Emphasis in original). 

Valid uses of the NTE. ETS now reports scores for the tests of Communication
Skills, General Knowledge, and Professional Knowledge separately,
and the Weighted Common Examination Total (WCET) is no longer reported.
A study of the former version of the NTE, however, found that WCET scores
correlated .77 with GRE Verbal scores (Johnson 1963), which suggests that
the NTE and tests of verbal ability involve similar aptitudes. The
Dallas Independent School District (Webster 1980) confirmed this when
it obtained a correlation of .81 between the WCET and the Verbal section
of the Wesman Personnel Classification Test. The NTE, then, is probably
a valid test of general literacy.
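
One way to put the correlations quoted in this section on a common footing is to square them: for a Pearson correlation r, the value r squared is the proportion of variance in one measure that is linearly associated with the other. The sketch below is only illustrative arithmetic on the figures cited above; the function name and the labels are ours and are not part of any of the cited studies.

```python
def shared_variance(r):
    """Return r squared, the proportion of variance two measures share linearly."""
    return r * r


# Correlations quoted in the text; the labels are informal descriptions.
reported = {
    "NTE composite vs. first-year supervisor ratings (median)": 0.11,
    "NTE composite vs. third-year supervisor ratings (median)": 0.10,
    "WCET vs. GRE Verbal (Johnson 1963)": 0.77,
    "WCET vs. Wesman Verbal (Webster 1980)": 0.81,
}

for label, r in reported.items():
    print(f"{label}: r = {r:.2f}, shared variance = {shared_variance(r):.1%}")
```

On these figures, the NTE composite shares roughly 1% of its variance with supervisor ratings, but roughly 59% to 66% with the two measures of verbal ability.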

ETS recommends that NTE tests be used as part of a process for selecting
teachers for initial employment only when the district 

• Uses multiple criteria, including interviews, references, 
classroom observation, transcripts, and written applications; 

• Publicly promulgates selection criteria; 

• Carries out studies to establish the content validity of the 
tests for teachers in the local district.

ETS strongly advises against giving the test to inservice teachers and 
states that 

NTE tests should not be used by school districts, directly or
indirectly, to determine the compensation, retention, termination,
advancement, pay supplements, or change in provisional
employment status of teachers once they are employed.

(NTE Policy Council 1983) 

We believe that these guidelines are sound: NTE scores should be used,
if at all, only as one part of the initial selection process; and they 
should not be considered a predictor of teaching ability, but only of 
general literacy. 

(Houston Independent School District, as part of the Second Mile Plan,
planned to require all teachers to achieve a certain score on the
Pre-Professional Skills Tests (an ETS test battery closely related to the
NTE) or be frozen on the salary schedule. ETS informed HISD that they
would no longer make the tests available if HISD intended to use them
for that purpose; HISD has responded by developing its own test.)

III. Dallas' Studies

In the late 1970s, the Dallas Independent School District undertook
a program of research as part of an attempt to improve its teacher
selection (Webster 1980). In 1977, all first-year teachers in DISD
participated in a study comparing the NTE common score, the Wesman
Personnel Classification Test (a 28-minute test of verbal and quantitative
ability), and a formally scored interview as predictors of effective
teaching.

The criterion variables were principals' ratings, residual mean classroom
gain scores on the ITBS (controlling for pretest score), and behavioral
ratings. The behavioral ratings used depended on the teacher's grade level.
For secondary teachers, the Class Average Residualized Composite Score
(CARCS) was calculated for each teacher by obtaining students' ratings
on a reliable 37-item measure of teacher behavior based on Gagne and
Briggs' (1974) theory of pedagogy and controlling for the effects of
course subject matter, expected grade, halo effects, and student/teacher
ethnic differential. Secondary teachers were also assessed by trained observers.
The elementary teachers were rated by the observers only.

Important results from this study were:

• The Wesman Verbal score correlated .81 with the NTE common score. 

• Neither the NTE, the Wesman, interviews, nor student ratings 
(CARCS) correlated with principals' evaluations. 

• Among secondary teachers, both NTE and Wesman Verbal scores
correlated .47 with student ratings (CARCS). The Wesman is also
correlated with observers' ratings at both elementary and
secondary levels.

• Interview scores had very low correlations with the Wesman, the 
NTE, and student ratings. 

• Wesman Verbal scores were a moderate predictor of achievement
test gains at the middle and secondary school level but not
at the elementary level. The failure to predict elementary ITBS
scores may be caused by insufficient variance in ITBS scores at
lower grade levels.

In sum, the Wesman is at least as good a predictor of behavioral ratings
and test score gains as the NTE, with which it is highly correlated.
The NTE, however, is administered by ETS, is expensive for the examinee,
takes several hours to complete, and takes several weeks for the scores to
be reported, whereas the Wesman takes 28 minutes and can be administered in the
personnel office. (DISD now uses its own test, similar to the Wesman, for
security reasons.)

IV. Other Considerations 

Legality of testing. This is an area that should be studied very carefully
by legal counsel for the District. In 1978 the United States Supreme Court 
summarily affirmed a federal district court's ruling that the State of 
South Carolina's use of the NTE to certify teachers was constitutional 
and that local districts' use of the tests for salary purposes was, though 
opposed by ETS, lawful under Title VII of the Civil Rights Act of 1964 
(NTE Policy Council 1983). The fact that the federal court's judgement 
was summarily affirmed, without benefit of briefs and oral arguments on 
the merits of the decision, means that it has full precedential weight only 
in the federal judicial circuit in which the case arose. 

In general, the key issues in court challenges to the NTE appear to be
(1) intentional racial discrimination and (2) the test's content validity. 
ETS emphasizes that this validity must be established locally, because 
it may differ across districts. The courts have also ruled against using 
test scores as the sole criterion of hiring. 

Minorities' scores. While the NTE itself has been judged by the
courts to be nondiscriminatory, it is a fact that minorities score
substantially lower than Anglos; the same is true of the Wesman and similar
tests. Adoption of test scores as a criterion of hiring, even if it is
only one of many, may conflict with the District's affirmative action
commitment. DISD has attempted to overcome this by aggressive nationwide
recruitment of minorities (R. Mitchell 1978).

State requirements. Beginning in 1986, the State of Texas will require
all prospective teachers to pass some kind of certification test (as yet
unspecified) before being licensed. It may be unnecessary for AISD to
do the same. The PPST is now required for college students entering a
teacher certification program.



V. Psychological Testing of Teachers 

The long search for psychological instruments which could differentiate 
good teachers from poor teachers has not been a fruitful one. This was 
a very popular and active research area at one time. The first Handbook
of Research on Teaching devoted a long chapter to "The Teacher's Personality
and Characteristics" (Getzels and Jackson 1963); the authors
confined themselves to studies published since 1950 and still had to sort
through more than 800. The second version of the Handbook (Travers 1973),
however, barely mentioned the topic, and an ERIC search covering the years
1980-83 turned up very few relevant studies.

One can easily see why interest in research attempting to relate personality
characteristics to teaching effectiveness has waned: it wasn't getting
anywhere. There are plenty of statistically significant findings in the
literature, but there appears to be no coherent pattern to them, and
very, very few have been replicated or cross-validated.

Even the studies that have shown associations between one characteristic
or another and good teaching do not inspire confidence that a measure of
this characteristic can justifiably be used as part of an actual selection
process. There are several reasons for this. First, as mentioned above,
most findings have not been cross-validated on a separate sample, a process
which is always necessary to establish the validity of a selection
instrument. Second, the criterion variable "teaching ability" is measured in
different ways from study to study and has itself proven to be a very
difficult variable to measure validly. Most studies have used supervisor
ratings, which seldom have documented validity.

A third problem lies in the strength of the relationship between any 
single characteristic and teaching ability. If we obtain supervisor ratings 
for a group of teachers and separate them into a "good" group and a "poor" 
group (the usual method in these sorts of studies), then we administer 
an instrument measuring Characteristic A and find a difference in group
means, we can say that the presence of Characteristic A is predictive
(in a statistical sense) of teaching ability. There will invariably be 
an overlap in scores, so that some poor teachers score higher than some 
good teachers. If the overlap is small, the instrument could have some 
value as a selection device, but our review of the literature uncovered 
no such instruments. The relationships found, even when statistically 
"significant," have not been large enough nor consistent enough to be of 
practical use. In general, the best statement one could make is that
the qualities of a good teacher are probably those of any mentally 
healthy person. 

This last statement suggests another possible goal of administering 
personality assessment instruments to teacher candidates: screening out 
the seriously disturbed. While there is no test that alone can do this, 
there are tests which could be used to indicate which candidates might 
warrant further professional psychological investigation. The difficulty 
here is that the rate of serious disturbance in a population of teacher 
candidates is so low that the efficiency of a screening instrument is 
impaired. 

Cronbach (1970, pp. 538-539) gives an example of this problem, using for
illustration the Minnesota Multiphasic Personality Inventory (MMPI), the
most popular instrument used for initial assessment by clinicians. The
mean score on the Depression scale of the MMPI is 50. If a psychologist
classifies anyone scoring above 70 as depressed, he or she will be correct
80% of the time, but only if depressed patients constitute half of the patients
he or she sees. If depressives are only 20% of the population, the cutoff
score must be 83 to get the same "hit" rate, but such a high cutoff would
leave more than half of the depressives undetected. Among teaching
candidates, the rate of serious mental illness is undoubtedly very low (surely
less than 1%), so in order not to miss anybody with real problems one must
tolerate an extremely large number of false alarms. Of course, separating
the false alarms from the hits would require costly professional
evaluation. So the question here is really one of practicality: is it worth
the time and expense required to screen out the small number of seriously
disturbed people who have nevertheless become certified teachers and applied
for a job in the District, and who wouldn't be screened out on any other
grounds?
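
The base-rate problem described above follows from Bayes' rule. The sketch below is not Cronbach's calculation: the sensitivity and specificity of 0.80 are hypothetical values, chosen only so that the hit rate is 80% when half of those tested have the condition, and the cutoff is held fixed rather than raised. It shows how the proportion of flagged people who are truly affected falls as the base rate falls.

```python
def hit_rate(base_rate, sensitivity, specificity):
    """Proportion of flagged people who truly have the condition (Bayes' rule)."""
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


SENSITIVITY = 0.80  # hypothetical: share of affected people the cutoff flags
SPECIFICITY = 0.80  # hypothetical: share of unaffected people it clears

for base_rate in (0.50, 0.20, 0.01):
    rate = hit_rate(base_rate, SENSITIVITY, SPECIFICITY)
    false_alarms_per_hit = (1.0 - rate) / rate
    print(f"base rate {base_rate:.0%}: hit rate {rate:.1%}, "
          f"false alarms per true case {false_alarms_per_hit:.2f}")
```

Under these assumed values, a 1% base rate yields about 25 false alarms for every true case flagged, each of which would need professional follow-up to resolve.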



VI. The Teacher Perceiver Interview 

The Teacher Perceiver Interview is a selection process developed by 
Selection Research Incorporated of Lincoln, Nebraska, involving a structured
interview with questions revolving around 12 "Teacher Themes."
There are three versions, having 60, 24, and 12 items, respectively. The 
stated purpose of the Teacher Perceiver is to "identify the very best 
teaching talent." 

Miller, Clements, and Gardner (Note 1) assessed the documentation of the 
Teacher Perceiver Interviews provided by SRI; discussed the implementation 
of the Teacher Perceiver system in Chicago, Houston, and Austin; and 
interviewed practitioners and administrators in AISD, who compared the 
Teacher Perceiver Interview with more traditional methods of teacher selection.

Miller et al. are sharply critical of the documented validity of the
Teacher Perceiver. This criticism rests on several grounds. First, no
validation study has ever appeared in a refereed journal. (In fact one
"landmark" study cited by SRI could not be found at all.) Most of the
studies cited by SRI consist of doctoral dissertations done at the University
of Nebraska and of unpublished field studies conducted under the auspices
of specific school districts. Second, even for these studies the
reported correlations with student or administrator rankings of the
teachers do not compare favorably with correlations obtained between
industrial interview procedures and job performance measures. Third, the
criterion measures may themselves be of questionable validity.

The discussion of the Teacher Perceiver's implementation in Chicago
includes several quotations from a radio documentary broadcast on station
WBBM in Chicago in December 1978. Professor Herbert Walberg of the
University of Illinois said,

It would be a real miracle in education to have an instrument like
this that would have a great deal of predictive validity.... 16,000
educational researchers in the United States have not been able
to produce something like this ... since the turn of the century.

Maurice Esch, Dean of the College of Education at the University of 
Illinois: 

...it lacks adequate predictive validity ... these are unproven
claims....

Dr. Bernard McKenna, National Education Association: 

What they are asking is people to express themselves
about their very deep convictions and very complex
psychological things on almost spur of the moment
answers. [This results in] giving high scores to some
people who are very glib and quick and talk well off
the top of their head.

Miller et al. (Note 1) interviewed four AISD staff members concerning their 
opinions about the Teacher Perceiver: an elementary principal, a junior 
high principal, the director of staff personnel, and the coordinator of 
secondary mathematics. Both principals were "quite enthusiastic" about 
the Teacher Perceiver. Both used the Perceiver in conjunction with such 
objective indices as grade point average, and both believed it superior 
to the previous AISD interview procedure. 

The director of staff personnel was also pleased with the Teacher Perceiver, 
but the coordinator of secondary mathematics had a very different opinion. 
He found the training "worthless," the ratings of the "exemplary" taped 
interviews very subjective, and the trainers rigid. 

Miller et al. (Note 1) conclude that the empirical bases for claims of 
the validity of the Teacher Perceiver are weak and that there is no 
evidence that the system is predictive of good teaching. 

Carsrud, Young, Krus, Click, Gronlie, and Culver (Note 2) have recently 
reported the results of a validation study of the Teacher Perceiver
conducted within AISD. They studied 27 special education teachers who had
been interviewed with the Teacher Perceiver before employment and who
had subsequently received performance evaluations, and found that only
two of the 12 Theme scores (Empathy and Individual Perception) were
predictive of success. The multiple R (an indicator of the strength of the
relationship between the predictors and the criterion) for these two
variables was high, however, providing some encouragement. Carsrud et al.
do not report whether the criterion ratings and the interview ratings were
made by different raters; if so, the results are stronger. Further
validation is needed in any case, however.

On balance, it is difficult to draw a conclusion about the Teacher
Perceiver's validity. Until more evidence is gathered in AISD, it is probably
best to follow Miller et al.'s (Note 1) recommendation and not use the
instrument on any but an experimental basis.



VII. Conclusion 

It may seem that this report has been very negative concerning various 
methods of selecting competent teachers. We should point out that in 
a sense the dice are loaded against finding any selection process to be 
effective, because in most studies the people who fail the process are 
no longer evaluated, since they do not get hired. This is like trying to
judge the relationship between height and basketball ability by correlating
height with success among professional players. There is probably
no correlation at that level, but all the players had to be tall enough to
become professionals in the first place, so the true relationship is
masked. A "restricted range" on a variable usually reduces statistical
correlations.
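
The restricted-range effect can be illustrated with a small simulation. The sketch below uses entirely synthetic data and assumed values (a true correlation of .50 between a selection score and later performance, with only the top 10% of scorers "hired"); none of the numbers come from the studies reviewed here. It compares the correlation in the full applicant pool with the correlation among those hired.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_range_restriction(rho=0.5, n=20000, keep_fraction=0.10, seed=0):
    """Return (r in full pool, r among the top keep_fraction of scorers)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        score = rng.gauss(0.0, 1.0)
        # Performance is built to correlate rho with the score in the full pool.
        performance = rho * score + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((score, performance))

    full_r = pearson_r([s for s, _ in pairs], [p for _, p in pairs])

    # "Hire" only the highest scorers, then correlate within that group.
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    hired = pairs[: int(n * keep_fraction)]
    hired_r = pearson_r([s for s, _ in hired], [p for _, p in hired])
    return full_r, hired_r


full_r, hired_r = simulate_range_restriction()
print(f"correlation in full applicant pool: {full_r:.2f}")
print(f"correlation among those hired:      {hired_r:.2f}")
```

In this synthetic setting the correlation among those hired is markedly lower than in the full pool, even though the underlying relationship is unchanged.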

Our real conclusion is not that current methods of teacher selection are
inadequate, but rather that no single method is a panacea. All sources
of information (interviews, grades, test scores of verbal ability, letters
of recommendation, student teaching evaluations, and written applications)
have value. No one source should be relied upon to the exclusion of others.